A NOTE ON SPARSE QUASI-NEWTON METHODS


by 

Mukund  Thapa 


1.  Introduction 

Consider the unconstrained minimization problem

$$\min_{x \in \mathbb{R}^n} f(x). \tag{1.0}$$

An important class of algorithms used to solve the above problem is that of Quasi-Newton algorithms [1]. The idea of these methods is to maintain a positive definite symmetric matrix that approximates the Hessian at each iteration. Given the point $x_k$ in $\mathbb{R}^n$, the algorithm obtains a direction of descent, $p_k$, by solving the system of equations

$$B_k p_k = -g_k, \tag{1.1}$$

where $B_k$ is the approximation to the Hessian at iteration $k$ and $g_k$ is the gradient at $x_k$. The next point, $x_{k+1}$, is then set to $x_k + \alpha_k p_k$, where $\alpha_k$ is chosen to cause a "sufficient" decrease in the function value at $x_k$. If the new point, $x_{k+1}$, satisfies some convergence criteria, the algorithm is terminated; else, the above procedure is repeated after obtaining $B_{k+1}$, a new approximation to the Hessian, as follows:

$$B_{k+1} = B_k + U_k, \tag{1.2}$$

where $U_k$ is a matrix chosen so that $B_{k+1}$ is symmetric, positive definite and satisfies the Quasi-Newton condition (henceforth referred to as the QN condition)

$$B_{k+1} s_k = y_k, \tag{1.3}$$

with $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$.


There are a number of different ways of choosing $U_k$ in equation (1.2). Three possible choices are shown below.

BFGS Update:

$$U_k^{BFGS} = \frac{y_k y_k^T}{s_k^T y_k} - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} \tag{1.4}$$

DFP Update:

$$U_k^{DFP} = \frac{(y_k - B_k s_k)\, y_k^T + y_k (y_k - B_k s_k)^T}{y_k^T s_k} - \frac{(y_k - B_k s_k)^T s_k \; y_k y_k^T}{(y_k^T s_k)^2} \tag{1.5}$$


Self-Scaling BFGS:

$$B_{k+1} = \left( B_k - \frac{B_k s_k s_k^T B_k}{s_k^T B_k s_k} \right) \frac{s_k^T y_k}{s_k^T B_k s_k} + \frac{y_k y_k^T}{s_k^T y_k} \tag{1.6}$$
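As a quick check of the formulas above, the following minimal sketch applies the BFGS correction (1.4) and the DFP correction (1.5) to a small matrix and confirms that each result is symmetric and satisfies the QN condition (1.3). The matrix `B`, the vectors `s` and `y`, and all function names are assumptions chosen for illustration; exact rational arithmetic is used so that the equalities hold without rounding.

```python
from fractions import Fraction as F

def matvec(A, v):
    """Return A v for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    """Return the inner product u^T v."""
    return sum(a * b for a, b in zip(u, v))

def bfgs_update(B, s, y):
    """Return B + U with U the BFGS correction of equation (1.4)."""
    Bs = matvec(B, s)
    sy, sBs = dot(s, y), dot(s, Bs)
    n = len(s)
    return [[B[i][j] + y[i] * y[j] / sy - Bs[i] * Bs[j] / sBs
             for j in range(n)] for i in range(n)]

def dfp_update(B, s, y):
    """Return B + U with U the DFP correction of equation (1.5)."""
    r = [yi - bi for yi, bi in zip(y, matvec(B, s))]   # r = y - B s
    ys, rs = dot(y, s), dot(r, s)
    n = len(s)
    return [[B[i][j] + (r[i] * y[j] + y[i] * r[j]) / ys
             - rs * y[i] * y[j] / ys ** 2
             for j in range(n)] for i in range(n)]

def is_symmetric(A):
    n = len(A)
    return all(A[i][j] == A[j][i] for i in range(n) for j in range(n))

# Hypothetical data with s^T y > 0.
B = [[F(2), F(-1), F(0)],
     [F(-1), F(2), F(-1)],
     [F(0), F(-1), F(2)]]
s = [F(1), F(2), F(1)]
y = [F(1), F(1), F(3)]

for update in (bfgs_update, dfp_update):
    B_next = update(B, s, y)
    # Both lines print True: the QN condition (1.3) and symmetry hold.
    print(update.__name__, matvec(B_next, s) == y, is_symmetric(B_next))
```

The sketch checks only symmetry and the QN condition; it does not test positive definiteness.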
Quasi-Newton methods have been very successful in solving unconstrained and constrained problems of moderate size. The difficulty in applying these methods to large problems is that a symmetric $n \times n$ matrix (or a factorization) must be stored. However, many large problems have a sparse Hessian whose sparsity pattern is known (or can be determined) a priori. In this case, it seems possible to maintain a suitably sparse approximation to the Hessian, and much current research is being directed to this objective (see [2], [3], [4], [5]).

Updates of the type given by equations (1.4), (1.5) and (1.6) cause total fill-in (that is, they do not preserve any zeros of the Hessian approximation). Obtaining updates that preserve sparsity and satisfy the Quasi-Newton condition (1.3) requires the solution of a linear system of equations whose coefficient matrix has the same sparsity pattern as the Hessian. This does not guarantee positive definiteness; in fact, it is not always possible to satisfy the Quasi-Newton condition (1.3) and preserve positive definiteness while maintaining sparsity (see [3], for example). Furthermore, sparse updates are usually of rank $n$, so the factorization of the Hessian approximation cannot easily be updated. This results in the additional work of refactorizing the Hessian approximation at each iteration.
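The fill-in caused by a dense update can be seen on a small case. The sketch below, written under assumed data (a 5 x 5 tridiagonal `B` and hypothetical vectors `s` and `y`), counts the zero entries before and after adding the BFGS correction of equation (1.4).

```python
from fractions import Fraction as F

def bfgs_correction(B, s, y):
    """Return the BFGS correction U of equation (1.4) as a dense list of rows."""
    n = len(s)
    Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
    sy = sum(s[i] * y[i] for i in range(n))
    sBs = sum(s[i] * Bs[i] for i in range(n))
    return [[y[i] * y[j] / sy - Bs[i] * Bs[j] / sBs for j in range(n)]
            for i in range(n)]

def count_zeros(A):
    """Return the number of entries of A that are exactly zero."""
    return sum(1 for row in A for a in row if a == 0)

n = 5
# Hypothetical tridiagonal approximation: 2 on the diagonal, -1 next to it.
B = [[F(2) if i == j else F(-1) if abs(i - j) == 1 else F(0)
      for j in range(n)] for i in range(n)]
s = [F(1), F(2), F(3), F(4), F(5)]
y = [F(1), F(1), F(1), F(1), F(1)]

U = bfgs_correction(B, s, y)
B_next = [[B[i][j] + U[i][j] for j in range(n)] for i in range(n)]
print(count_zeros(B), count_zeros(B_next))   # prints: 12 0
```

In this instance every one of the 12 zeros of `B` is lost, because the rank-one term $y y^T$ is dense whenever $y$ has no zero component.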

Shanno [3] showed how the sparse analog of any symmetric update can be derived by variational means. This paper shows how these sparse analogs can be derived as a simple extension of Toint's derivation of a sparse update.
2.  Definitions and Notation

In  the  rest  of  the  paper  the  subscript  k  will  be  dropped  and 
the  subscript  k  +  1  will  be  replaced  by  the  superscript  *. 

Let $B$ be the sparse symmetric matrix representing the approximation to the Hessian at the start of iteration $k$.

Let $N = \{(i,j) : B_{ij} = 0\}$; that is, $N$ represents the sparsity pattern assumed at the start of the algorithm. Note that the sparsity pattern is assumed to be fixed and any additional zeros created are treated as non-zeros.

Let

$$\bar{N} = \{(i,j) : i,j = 1, \ldots, n\} \setminus N = \{(i,j) : B_{ij} \neq 0\}.$$

For any symmetric matrix $A$, define matrices $A_N$ and $A_{\bar N}$ as follows:

$$(A_N)_{ij} = \begin{cases} A_{ij} & (i,j) \in N \\ 0 & (i,j) \in \bar{N} \end{cases} \qquad (A_{\bar N})_{ij} = \begin{cases} 0 & (i,j) \in N \\ A_{ij} & (i,j) \in \bar{N} \end{cases}$$

In words, $A_N$ is the matrix $A$ with zeros in the positions corresponding to the non-zeros of $B$; and $A_{\bar N}$ is the matrix $A$ with zeros in the positions corresponding to the zeros of $B$. Then $A$ can be written as

$$A = A_N + A_{\bar N}.$$

For each $i$, define a diagonal matrix whose diagonal elements are 0 or 1 depending on the sparsity pattern of the $i$th row of $B$: its $j$th diagonal element is 1 if $(i,j) \in \bar{N}$ and 0 if $(i,j) \in N$.

Finally, for any vector $s$, define $s^i$ to be the product of this diagonal matrix with $s$.

An  example  that  illustrates  the  above  definitions  and  notations 
now  follows. 

Example : 
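As a minimal sketch of these definitions, the code below uses an assumed 3 x 3 tridiagonal pattern; the matrices, the vector and the helper names are hypothetical and are not taken from the report. Indices are 0-based in the code.

```python
# Hypothetical sparse symmetric B; its zero positions define N.
B = [[4, 1, 0],
     [1, 4, 1],
     [0, 1, 4]]
n = len(B)
N = {(i, j) for i in range(n) for j in range(n) if B[i][j] == 0}
N_bar = {(i, j) for i in range(n) for j in range(n)} - N

def split(A):
    """Return (A_N, A_Nbar): A kept on the zero / non-zero positions of B."""
    A_N = [[A[i][j] if (i, j) in N else 0 for j in range(n)]
           for i in range(n)]
    A_Nbar = [[A[i][j] if (i, j) in N_bar else 0 for j in range(n)]
              for i in range(n)]
    return A_N, A_Nbar

def s_row(s, i):
    """Return s^i: s with component j set to zero whenever (i, j) is in N."""
    return [s[j] if (i, j) in N_bar else 0 for j in range(n)]

A = [[1, 2, 3],
     [2, 5, 6],
     [3, 6, 9]]
A_N, A_Nbar = split(A)
s = [7, 8, 9]

print(A_N)          # [[0, 0, 3], [0, 0, 0], [3, 0, 0]]
print(A_Nbar)       # [[1, 2, 0], [2, 5, 6], [0, 6, 9]]
print(s_row(s, 0))  # [7, 8, 0]
print(s_row(s, 1))  # [7, 8, 9]
```

Adding `A_N` and `A_Nbar` entry by entry recovers `A`, which is the decomposition $A = A_N + A_{\bar N}$.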
3.  Toint's Method

Toint [2] proposed finding a matrix $E$ such that: $B^*$ $(= B + E)$ is closest to $B$ in some sense; $B^*$ has the same sparsity pattern as $B$ (thus, $E$ has the same sparsity pattern as $B$); and $B^*$ satisfies the Quasi-Newton condition (1.3). Formally, the problem can be stated as:

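$$\text{(P1)} \quad \min \; \|E\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij}^2, \quad \text{where } \|\cdot\|_F \text{ is the Frobenius norm} \tag{3.0}$$

such that

$$Es = y - Bs \tag{3.1}$$

$$E_{ij} = 0, \quad (i,j) \in N \tag{3.2}$$

$$E = E^T \tag{3.3}$$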


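By variational means, Toint obtained the following result:

$$E_{ij} = \begin{cases} 0 & (i,j) \in N \\ \lambda_i s_j + \lambda_j s_i & (i,j) \in \bar{N} \end{cases} \tag{3.4}$$

where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is the solution of the linear system

$$\Phi \lambda = y - Bs \;\; (= Es) \tag{3.5}$$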
with $\Phi$ defined by

$$\Phi_{ij} = (s^i)_j (s^j)_i + \|s^i\|^2 \delta_{ij} \quad \forall\, i,j \tag{3.6}$$

and $\delta_{ij}$ is the Kronecker delta.

Note that $\Phi$ is symmetric and has the same sparsity pattern as $B$. Furthermore, $\Phi$ is positive definite if and only if $\|s^i\| > 0$ for all $i$ (see Toint [2]).
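The link between the form of $E$ and the linear system for $\lambda$ can be checked directly. The short derivation below is a sketch that uses only the definitions of $s^i$ and $\bar{N}$ and the symmetry of the sparsity pattern; it shows that row $i$ of $Es$ equals row $i$ of $\Phi\lambda$.

```latex
\begin{aligned}
(Es)_i &= \sum_{j:\,(i,j)\in\bar N} (\lambda_i s_j + \lambda_j s_i)\, s_j \\
       &= \lambda_i \sum_{j:\,(i,j)\in\bar N} s_j^2
          \;+\; \sum_{j:\,(i,j)\in\bar N} s_i s_j\, \lambda_j \\
       &= \|s^i\|^2 \lambda_i \;+\; \sum_{j=1}^{n} (s^i)_j\,(s^j)_i\, \lambda_j
        \;=\; \sum_{j=1}^{n} \Phi_{ij}\, \lambda_j .
\end{aligned}
```

The last step uses $(s^i)_j (s^j)_i = s_i s_j$ when $(i,j) \in \bar{N}$ and $0$ otherwise, so requiring $Es = y - Bs$ is the same as requiring $\Phi\lambda = y - Bs$.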

In matrix notation,

$$E = \sum_{i=1}^{n} \lambda_i \left[ e_i (s^i)^T + s^i e_i^T \right], \tag{3.7}$$

where $e_i$ is the unit vector with 1 in the $i$th position, and

$$\Phi = \sum_{j=1}^{n} \left[ s_j\, s^j + \|s^j\|^2 e_j \right] e_j^T. \tag{3.8}$$
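The steps of Toint's method can be sketched in a few lines. The code below is an illustration under stated assumptions, not Toint's implementation: it stores matrices densely, takes the sparsity pattern from the non-zeros of a hypothetical `B` (so `B` is assumed to have no accidental zeros inside the pattern), and solves the system for $\lambda$ by exact Gauss-Jordan elimination. All names and data are hypothetical.

```python
from fractions import Fraction as F

def matvec(A, v):
    """Return A v for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def toint_update(B, s, y):
    """Return (E, lam) following equations (3.4) to (3.6)."""
    n = len(s)
    keep = [[B[i][j] != 0 for j in range(n)] for i in range(n)]  # (i, j) in N-bar
    si = [[s[j] if keep[i][j] else F(0) for j in range(n)]
          for i in range(n)]                                     # si[i] is s^i
    Phi = [[si[i][j] * si[j][i] + (sum(v * v for v in si[i]) if i == j else 0)
            for j in range(n)] for i in range(n)]
    rhs = [yi - bi for yi, bi in zip(y, matvec(B, s))]           # y - B s
    lam = solve(Phi, rhs)
    E = [[lam[i] * s[j] + lam[j] * s[i] if keep[i][j] else F(0)
          for j in range(n)] for i in range(n)]
    return E, lam

# Hypothetical tridiagonal B and vectors s, y.
B = [[F(2), F(-1), F(0)],
     [F(-1), F(2), F(-1)],
     [F(0), F(-1), F(2)]]
s = [F(1), F(2), F(1)]
y = [F(1), F(1), F(3)]

E, lam = toint_update(B, s, y)
B_star = [[B[i][j] + E[i][j] for j in range(3)] for i in range(3)]
print(matvec(B_star, s) == y)   # True: the QN condition (1.3) holds
print(B_star[0][2] == 0)        # True: the zero of B is preserved
```

For this data the multipliers are $\lambda = (10/39,\, -7/26,\, 23/39)$. The sketch does not check whether `B_star` is positive definite.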

Toint also obtained a generalization by minimizing $\|WEW\|_F$, where $W$ is a diagonal matrix given by

(3.9)

In this case the $\Phi$ and $E$ matrices are defined


by 


«>  ,  ?  (,X, 


(3.10) 


Elj  "  ti1tj[Ai^sl)j  +  xJ(sj)1]  • 


(3.11) 
4.  Sparse Analogs of Symmetric Updates

Shanno  [3]  showed  how  sparse  analogs  of  symmetric  updates  (using 
BFGS  as  an  example)  could  be  derived  by  variational  means.  This  section 
shows  how  these  sparse  analogs  and  those  using  self-scaling  can  be 
derived  as  a  simple  extension  of  Toint's  results. 

Let $B^* = \eta B + U$, where $U$ is symmetric but in general will not have the same sparsity pattern as $B$; $\eta$ is some scale factor; and $B^* s = y$. Then, by definition we have

$$B^*_N = U_N \tag{4.0}$$

$$B^*_{\bar N} = \eta B_{\bar N} + U_{\bar N} \quad (\text{note that } B_{\bar N} = B) \tag{4.1}$$

Now $B^*_{\bar N}$ has the same sparsity pattern as $B$ but does not satisfy the Quasi-Newton condition (1.3). Hence, we want to find a $\hat{B}^*$ given by

$$\hat{B}^* = B^*_{\bar N} + E, \tag{4.2}$$

such that $\hat{B}^*$ is symmetric, has the same sparsity pattern as $B$ and satisfies the Quasi-Newton condition (1.3).

Next, note that

$$\hat{B}^* s = (B^*_{\bar N} + E)s = (B^* - B^*_N + E)s = y - (B^*_N - E)s.$$

Clearly, $\hat{B}^* s = y$ if and only if $(B^*_N - E)s = 0$, or

$$Es = B^*_N s.$$

Thus $\hat{B}^*$ is obtained by solving the following problem:

$$\text{(P2)} \quad \min \; \|E\|_F^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} E_{ij}^2$$

such that

$$Es = B^*_N s \tag{4.5}$$

$$E_{ij} = 0, \quad (i,j) \in N$$

$$E = E^T$$

Problem P2 is almost the same as problem P1. The only difference is in equation (4.5) of P2 and equation (3.1) of P1. Thus the solution to problem P2 is:

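$$E_{ij} = \begin{cases} 0 & (i,j) \in N \\ \lambda_i s_j + \lambda_j s_i & (i,j) \in \bar{N} \end{cases}$$

where $\lambda = (\lambda_1, \ldots, \lambda_n)$ is the solution of the linear system

$$\Phi \lambda = B^*_N s \;\; (= Es)$$

with $\Phi$ defined by (3.6) or (3.8).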

If the norm to be minimized is chosen to be $\|WEW\|_F$, with $W$ given by (3.9), then $\Phi$ and $E$ are given by (3.10) and (3.11) respectively.
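The construction of this section can be sketched for the unweighted Frobenius norm as follows. The code is a minimal illustration under assumptions: $U$ is taken to be the BFGS correction (1.4) with $\eta = 1$, matrices are stored densely, the pattern is read from the non-zeros of a hypothetical `B`, and every name is invented for the example.

```python
from fractions import Fraction as F

def matvec(A, v):
    """Return A v for a matrix stored as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with exact Fractions."""
    n = len(b)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # first usable pivot
        M[c], M[p] = M[p], M[c]
        M[c] = [v / M[c][c] for v in M[c]]
        for r in range(n):
            if r != c:
                M[r] = [vr - M[r][c] * vc for vr, vc in zip(M[r], M[c])]
    return [M[r][n] for r in range(n)]

def sparse_analog(B, U, s, eta=F(1)):
    """Return eta*B + U_Nbar + E, where E solves problem P2."""
    n = len(s)
    keep = [[B[i][j] != 0 for j in range(n)] for i in range(n)]  # (i, j) in N-bar
    U_N = [[F(0) if keep[i][j] else U[i][j] for j in range(n)]
           for i in range(n)]
    si = [[s[j] if keep[i][j] else F(0) for j in range(n)]
          for i in range(n)]                                     # si[i] is s^i
    Phi = [[si[i][j] * si[j][i] + (sum(v * v for v in si[i]) if i == j else 0)
            for j in range(n)] for i in range(n)]
    lam = solve(Phi, matvec(U_N, s))        # Phi lam = B*_N s = U_N s
    return [[eta * B[i][j] + U[i][j] + lam[i] * s[j] + lam[j] * s[i]
             if keep[i][j] else F(0)
             for j in range(n)] for i in range(n)]

# Hypothetical data; U is the dense BFGS correction (1.4).
B = [[F(2), F(-1), F(0)],
     [F(-1), F(2), F(-1)],
     [F(0), F(-1), F(2)]]
s = [F(1), F(2), F(1)]
y = [F(1), F(1), F(3)]
Bs = matvec(B, s)
sy = sum(a * b for a, b in zip(s, y))
sBs = sum(a * b for a, b in zip(s, Bs))
U = [[y[i] * y[j] / sy - Bs[i] * Bs[j] / sBs for j in range(3)]
     for i in range(3)]

B_hat = sparse_analog(B, U, s)
print(matvec(B_hat, s) == y)   # True: the QN condition (1.3) holds
print(B_hat[0][2] == 0)        # True: the sparsity pattern of B is kept
```

This sketch forms the dense $U$ only for clarity; Section 5 explains why the entries of $U_N$ do not have to be computed.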
5.  A Note on Computations

Shanno [3] indicated that the computation of $B^*_N s$ does not require the storage of the elements of $U_N$ but does require the computation of the elements of $U_N$ (that is, those elements of $U$ corresponding to the zero elements of $B$). However, the following result shows that the elements of $U_N$ need not be computed.

$$\begin{aligned}
B^*_N s &= U_N s && \text{(from (4.0))} \\
        &= (U - U_{\bar N})s && \text{(by definition of } U_N\text{)} \\
        &= Us - U_{\bar N} s \\
        &= (B^* - \eta B)s - U_{\bar N} s && \text{(since } B^* = \eta B + U\text{)} \\
        &= y - \eta Bs - U_{\bar N} s
\end{aligned}$$
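The identity above can be checked numerically. In the sketch below, which assumes the BFGS correction (1.4) with $\eta = 1$ and hypothetical data, the right-hand side $B^*_N s$ is accumulated using only entries of $U$ at the non-zero positions of `B`, and is then compared with a direct evaluation of $U_N s$. The dense loops stand in for whatever sparse storage an actual implementation would use.

```python
from fractions import Fraction as F

def rhs_without_U_N(B, s, y, u_entry, eta=F(1)):
    """Return y - eta*B s - U_Nbar s, visiting only non-zero positions of B."""
    n = len(s)
    out = []
    for i in range(n):
        acc = y[i]
        for j in range(n):
            if B[i][j] != 0:                       # (i, j) in N-bar
                acc -= (eta * B[i][j] + u_entry(i, j)) * s[j]
        out.append(acc)
    return out

# Hypothetical data.
B = [[F(2), F(-1), F(0)],
     [F(-1), F(2), F(-1)],
     [F(0), F(-1), F(2)]]
s = [F(1), F(2), F(1)]
y = [F(1), F(1), F(3)]
n = len(s)
Bs = [sum(B[i][j] * s[j] for j in range(n)) for i in range(n)]
sy = sum(s[i] * y[i] for i in range(n))
sBs = sum(s[i] * Bs[i] for i in range(n))

def u_bfgs(i, j):
    """Entry (i, j) of the BFGS correction (1.4), computed on demand."""
    return y[i] * y[j] / sy - Bs[i] * Bs[j] / sBs

cheap = rhs_without_U_N(B, s, y, u_bfgs)
# Direct evaluation of U_N s, which does need the entries of U_N.
direct = [sum(u_bfgs(i, j) * s[j] for j in range(n) if B[i][j] == 0)
          for i in range(n)]
print(cheap == direct)   # True
```

For this data both evaluations give $(1/2,\, 0,\, 1/2)$.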


6.  Conclusion 

This paper has shown how the sparse analogs of Quasi-Newton updates can be derived as a simple extension of Toint's results, and how the computation of $B^*_N s$ can be done efficiently. At present, research on the computational and theoretical aspects of sparse Quasi-Newton algorithms is continuing, and further results will be described in a later technical report.